Skip to content

Conversation

@georgerennie
Copy link
Collaborator

This adds the $barrier cell type which represents an optimization barrier that Yosys should not optimize through. Changes include:

  • Added the cell to the kernel
  • Added support in some of the common backends, treating $barrier as just a buffer. I also added $buf support while I was at it where it didn't exist.
  • Updated opt_merge to ignore $barrier
  • Added -barriers to flatten that will insert barriers to drive all public wires driven by the flattening
  • Added optbarriers command to insert/remove barriers everywhere that public wires are driven. This has the following options:
    • -nocells - don't add barriers for the outputs of cells
    • -noprocs - don't add barriers for the outputs of processes
    • -private - add barriers for private wires as well as public
    • -remove - convert selected barriers back to connections
  • Added -barriers flag to prep that calls optbarriers at the start and passes -barriers to flatten. This should be a drop in replacement for a regular prep call that has the benefit of maintaining all the structure among public wires.

This still needs tests and probably some discussion about what the flags/defaults should be as currently this is very conservative on the optimizations allowed. I wasn't sure exactly what optimizations people would want to opt into but attributes can be added to $barrier to do so.

@georgerennie georgerennie force-pushed the george/optimization_barriers branch from 767981a to 33d5138 Compare November 20, 2024 17:58
@whitequark
Copy link
Member

Is there a background discussion on this cell somewhere? I'm interested in it as it's similar to a $freeze cell (named by analogy to the proposed LLVM instruction) that I've suggested about two years ago.

@georgerennie
Copy link
Collaborator Author

georgerennie commented Nov 20, 2024

I brought it up at the Dev JF on monday but we didn't write down too much about the conclusions. The gist of it is that there are some applications for Yosys where it is more important to retain some amount of the structural information rather than just the semantic. This has caused issues like #3426 in the past and has given me similar issues recently where I want to structurally rewrite the drivers for some nodes which causes semantic changes to the design for formal verification. The alternative approach to something like optimization barriers is to not do the optimizations in the first place, but in my experience there tends to be lots of logic involved with private wires that cause a blowup in design size and can be cleaned away straightforwardly, the structure with respect to public wires is usually all that matters.

Am open to any suggestions/critiques about how to go about this, this is just a draft for one way it can be done.

@georgerennie
Copy link
Collaborator Author

I think I may need a bit more care with how procs get rewritten, there are some unhelpful interactions like the below

read_verilog <<EOT
module top(input wire clk, input wire rst, input wire [7:0] a, output reg [7:0] b);

always @(posedge clk or posedge rst) begin
	if (rst)
		;
	else
		b <= a;
end

endmodule
EOT

optbarriers
proc
clean
dump

without optbarriers this generated a $dff as expected, with it it generates an $aldff. This is because proc_arst rewrites this to

if (rst)
    barrier_in <= barrier_out;
else
    barrier_in <= a;

proc_dff can normally see that barrier_in and barrier_out are the same wire and so optimizes that case away into b <= rst ? b : a, but with the barrier it can't see that the two are the same.

I think the right solution to this is that as well as rewriting signals to come from a barrier when they appear on the lhs of assignments in processes, any times a signal appears on the rhs of an assignment in the same process in which it is assigned in an lhs we should rewrite the rhs to use the pre-barrier version of the signal. Maybe this should only be done for the default assignment at the top of the process though? I haven't quite worked this out yet (putting it here so I remember what the issue was when I come back to this).

@jix
Copy link
Member

jix commented Nov 23, 2024

Is there a background discussion on this cell somewhere? I'm interested in it as it's similar to a $freeze cell (named by analogy to the proposed LLVM instruction) that I've suggested about two years ago.

Since variants of this were brought up independently and for different use cases, I made an attempt now to summarize the motivating use cases that I'm aware of and how it would apply there:

  • Controlling 'x propagation during synthesis
    • Configured as a barrier that is transparent for values that are statically known to be always fully defined and opaque otherwise
    • Ensures that treating an 'x input-bit as a don't care and replacing it with an arbitrary value results in a consistent choice for all users as the barrier will inhibit any x-propagation or other optimization peeking across it until such a choice is fixed
    • @whitequark suggested this 5 years ago already: class Memory: add reset_less parameter that will not initialize memories with 0 value in generated RTL. amaranth-lang/amaranth#270 (comment) (2nd to last paragraph of the comment)
    • This could be done in such a way that a single potentially-'x bit makes the full barrier opaque or it could be treated as a bitwise barrier
    • It's not clear to me whether and how this would be generated automatically / implicitly, but I think the answer to that question wouldn't affect the requirements for the cell itself
      • The 1364.1 standard for verilog RTL synthesis (that was never updated for SV) restricts the use of x to instances where it is directly assigned to a signal. It also states "The value x shall not be used with any operators or mixed with other expressions.". It also separately states that for synthesis making an x assignment will result in that signal being treated as a don't care. I'd say 1364.1 doesn't clearly specify the intended overall behavior and is open to interpretation, but my interpretation would be that the idea was to allow direct x assignments to signals for don't cares but require that they are resolved at the point of assignment as that's the only way to ensure that x values do not end up being "used with any operators"
      • In practice people expect to be able to use x without such restriction, and, since the restriction is what motivated the interpretation that x has to be resolved at the point of assignment, I have no idea whether treating assignments as resolving don't cares is still reasonable
    • In this configuration it is the same idea as LLVM's freeze
      • That goes back to the 2017 paper "Taming Undefined Behavior in LLVM"
      • The language reference for LLVM's freeze instruction doesn't mention partially undef values, but my understanding is that it acts bitwise on partially undef values and there being no such thing as a partially poison scalar value (but I only have superficial knowledge of LLVM)
  • Provide consistent 'x-as-unconstrained behavior during 2-valued formal verification
    • The default behavior for FV is to use 2-valued logic with no x-propagation where any 'x value is treated as an unconstrained value (converted to $anyseq etc. during the FV preparation flow)
    • Currently the FV flow depends on prep calling opt -keepdc to prevent selecting arbitrary concrete values for don't care bits before the part of the flow that's run by SBY has a chance to convert x constants to $anyseq
      • This is quite fragile, I discovered multiple places in opt passes that were missing an explicit check for keepdc and I'm not super confident there aren't any left that are just harder to hit
      • It's much easier to miss a check for a -keepdc option by accident (especially if it was added as an afterthought) than accidentally traversing a barrier.
      • Operators that can produce 'x bits from fully defined values (division, etc) are not consistently treated as producing unconstrained values in that instance
      • It doesn't prevent cases where the frontend already does x-propagation
      • It also doesn't prevent duplicating x values as that happens with assign a = 'x; assign b = a and running clean already
    • The idea to properly fix this would be to
      • Insert barriers around any x constant as early as possible (would need frontend changes), configured for the same freeze-like behavior as in the previous synthesis use case
      • For cells that can produce x values from fully defined inputs add the same kind of barrier behind the output and eventually insert a mux to $anyseq for input values where the output is undefined. The barrier makes it safe to run other optimizations in between that might even get rid of the cell
      • Since elaboration of parameters requires evaluating potentially x containing values, I'd argue that the frontend should still do x-propagation for operators where all operands are constants. Importantly this does not include simplifying a + 'x to 'x when a is a signal even when there is an assign a = 'x nearby.
        • This would ensure consistent behavior when moving subexpressions between a signal vs a constant context (as an expression referring to a signal cannot be used in a constant context).
  • Preserving the module structure in a flattened design
    • For public signals we can already preserve the structure using hdlname and $scopeinfo, but for synthesis flow we might not want to or even be able to preserve all internal public signals
    • Flatten would insert barriers in place of flattened module ports
    • Different use cases for this might use different barrier configurations, depending on the amount of cross-module optimization that is wanted
      • Preventing all optimizations can be useful if the goal is to export to some backend that doesn't support submodules (e.g. BTOR2) while still being able to attribute every primitive to its containing module
      • Even when the module hierarchy is to be fully preserved in the sense that every post-synthesis cell being attributable to its source module, it would still be reasonable to allow removing unused logic, constant propagation or even optimizations that rewrite logic locally but make use of invariants derived from the input and/or output cone crossing a module boundary
      • More extreme but still precisely preserving the boundary would be to allow optimizations that re-encode signals crossing a boundary, similar to how FSM reencoding works
  • (Selectively) preserving the connectivity structure within a module
    • Manually with a (* dont_touch *) attribute that's stronger than (* keep *)
      • Whereas (* keep *) ensure that logic equivalent to the signal is kept, (* dont_touch *) ensures that all uses of the signal will use the kept logic
      • This requires making the signal fully opaque for all uses, i.e. corresponds to inserting a full optimization barrier
    • Automatically for analysis and/or verification methods that need to preserve the connectivity structure of the input design
      • Reasons for this that I can think of are methods that
        • report results back in terms of paths in the design or check a property specified in terms of that
        • derive useful heuristics from the connectivity structure of the design
        • perform some analysis using per-primitive approximations where performing logically valid optimizations that change the connectivity structure would still affect the quality of the results in ways that are hard to understand and/or control for the user
      • The appropriate kind of barrier to use might vary between different analyses
      • I believe @georgerennie's use case would also fit into this category

@whitequark
Copy link
Member

Thanks for such a thorough analysis, @jix!

@mattyoung101
Copy link
Contributor

I will also chime in and add that, if I'm understanding this pull correctly, this would be an incredibly useful feature for my project. I'm working on a triple modular redundancy plugin, and one major unresolved problem is passes like opt_share identifying the redundant logic as redundant (which it technically is!) and optimising it out.

If I could instead annotate cells with $barrier and have Yosys avoid optimising them, this would be a major help and mean I can move the TMR pass much earlier in the design pipeline and have better control of granularity. Currently, it has to run very late, after all the opt passes have completed.

@georgerennie georgerennie force-pushed the george/optimization_barriers branch 2 times, most recently from f864b7a to 6532cf6 Compare December 2, 2024 19:14
@georgerennie
Copy link
Collaborator Author

georgerennie commented Dec 2, 2024

Have fixed the process issue but it adds a reasonable bit more complexity. The idea is that a path through a set of assignments in a process from a variable to itself should see the pre-barrier version of itself, as this is needed in proc_dff to catch where an async rule doesn't change an FFs value. We do this by finding such cyclic paths through the assignment graph and duplicating them out to have a copy that is identical except its leaves see the pre-barrier version of the variable.

I have an example on a real design that matches the pattern I commented earlier in the thread where this more careful rewriting is needed to prevent an $aldff being inferred that fails check due to a comb path from the output back to the aload input. This is also helped by #4781, but obviously is primarily relevant for the verilog frontend.

On a reasonable sized design running proc; opt_clean I get the following stats running optbarriers first or not (only showing the cell counts that differ and flops:

   Number of wires:              76444
   Number of wire bits:         679737
   Number of public wires:       23528
   Number of public wire bits:  308522
   Number of ports:              14944
   Number of port bits:         172522
   Number of memories:               5
   Number of memory bits:          880
   Number of processes:              0
   Number of cells:              64408
     $adff                        1746
     $and                         2956
     $barrier                    16413
     $dff                            8
     $dlatch                       256
     $eq                          9056
     $logic_not                   2337
     $mux                        20295
     $not                         2788
     $or                          1928
     $pmux                        2431
   Number of wires:              62813
   Number of wire bits:         495929
   Number of public wires:       24379
   Number of public wire bits:  314651
   Number of ports:              14944
   Number of port bits:         172522
   Number of memories:               5
   Number of memory bits:          704
   Number of processes:              0
   Number of cells:              47385
     $adff                        1746
     $and                         2836
     $dff                            8
     $dlatch                       256
     $eq                          9044
     $logic_not                   2303
     $mux                        19912
     $not                         2736
     $or                          1920
     $pmux                        2430

The flop/latch inferences are the same (which is not true without the more complex process rewriting) and the rest of the numbers seem to be in the reasonable ranges for what id expect, although I haven't done a proper equivalence check or anything yet and this could probably do with tests. Not sure when I will get much time to add tests at the moment. Other than that I think the logic should be done so I will mark as ready for review for now.

@georgerennie georgerennie marked this pull request as ready for review December 2, 2024 19:39
Copy link
Member

@povik povik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed all the bits outside optbarries.cc, I think we need to fix the bmt2 $buf handling

register_reverse_wire_map(y_id, cell->getPort(ID::Y));
continue;
}
if (cell->type == ID($pos)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's strange this interprets $pos without looking at the A_SIGNED parameter, but if there's a bug it's pre-existing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think its fine to leave as is for now


if (cell->type == ID($not)) return export_bvop(cell, "(bvnot A)");
if (cell->type == ID($pos)) return export_bvop(cell, "A");
if (cell->type.in(ID($pos), ID($buf), ID($barrier))) return export_bvop(cell, "A");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will crash when given a $buf or $barrier as those don't have A_SIGNED

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, have updated export_bvop to only check for A_SIGNED if it exists

kernel/satgen.cc Outdated
}

if (cell->type.in(ID($pos), ID($buf), ID($neg)))
if (cell->type.in(ID($pos), ID($buf), ID($barrier), ID($neg)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by supplying a SAT model for $barrier to the optimization passes you are running the risk of barrier leakage. I would make the model opt-in, not sure how to do it best though

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about how to opt in - had added this so I could write equiv_opt tests but we have no easy way of separating "you can use this in sat models for verification" from "you can use this in sat models for optimization". I've removed this for now, as well as the similar implementation in cellaigs.cc because that meant barriers could be lost when running aigmap.

log(" 'hdlname' attribute.\n");
log("\n");
log(" -barriers\n");
log(" Use $barrier cells to connect flattened modules to their surrounding\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"introduce optimization barriers when a flattened module drives a public wire" ?


for (int i = 0; i < GetSize(new_conn.first); i++) {
const auto lhs = new_conn.first[i], rhs = new_conn.second[i];
auto& sigsig = !lhs.is_wire() || !lhs.wire->name.isPublic() ? skip_conn : barrier_conn;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think lhs.is_wire() shouldn't happen here, we could add an assert

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, !lhs.is_wire()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yes I had thought you could have this with something like tristates, but I double checked and such a scenario does get caught by the check a few lines earlier and thrown as an error, so it is fine to assume its non-const here. Even though its checked just before I'll add an assertion in case the two ever end up separated by refactoring.

@georgerennie georgerennie force-pushed the george/optimization_barriers branch from 3ed98d6 to 784751f Compare July 10, 2025 14:37
@georgerennie
Copy link
Collaborator Author

Am going to take a look at this again - life got busy and it slipped off my todo list to finish responding to reviews here. Have just rebased it onto main and will address the comments + add some tests now.

@georgerennie georgerennie force-pushed the george/optimization_barriers branch from 784751f to 9dc6f5d Compare July 10, 2025 14:44
@georgerennie
Copy link
Collaborator Author

georgerennie commented Jul 14, 2025

As to a discussion I had with @widlarizer about whether this meets the requirements for something like LLVM's freeze, we think it doesn't. The semantics of the two are as follows:

  • barrier: users should not look through the barrier, so must assume it can take any value such as 0/1/x. because it must be assumed that it can take non x values, multiple uses of the output of the barrier should all see the same value

  • freeze: users can assume it symbolically takes a concrete value (0/1), and are allowed to look through it. this means that when the input is x, users can apply optimizations that require it to take a concrete value. multiple uses of the output of the freeze should all see the same value.

The corrolary of this is that a barrier implies freeze but is stronger (it prevents reasoning based on the input value). If we want the freeze behaviour this can be added later either with a separate cell or maybe an attribute/parameter of $barrier.

@jix
Copy link
Member

jix commented Jul 14, 2025

The corrolary of this is that a barrier implies freeze but is stronger (it prevents reasoning based on the input value). If we want the freeze behaviour this can be added later either with a separate cell or maybe an attribute/parameter of $barrier.

That is consistent with what I've been thinking of.

When I was writing my comment above I was considering a $barrier that is fully opaque by default (i..e. prevents all reasoning through the barrier, basically treating the barrier input as a primary output and the barrier output as a primary input), but can be configured to be weaker.

The first bullet point of that talks about a variant that is transparent only if the input is statically known to be non-x (let's call that configuration an x-prop-blocking $barrier), which would give it freeze like behavior for optimization passes (in the sense that any refinement that turns an x into a 0/1 will be consistent across all uses of the x value since the barrier prevents x-propagation or any other optimization that could make globally inconsistent choices for x).

If you combine the x-prop-blocking $barrier with something like setundef -anyseq (or rather a variant of it that will also work with dynamic x values, not just constants, what I described in the second bullet point above) you'd end up with freeze behavior where there's a consistent symbolic choice for each x that occurs.

@georgerennie georgerennie force-pushed the george/optimization_barriers branch from 9dc6f5d to 3bcca55 Compare July 15, 2025 15:35
Copy link
Collaborator Author

@georgerennie georgerennie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed the review comments and added some basic testcases.

I also did a best effort test that this didn't seem to break the changed backends with the following script that tries to output picorv32 with and without barriers - some of that is a bit verified by hand but it seems to give reasonable results.

read_verilog picorv32.v
prep -top picorv32 -barriers -flatten
pmuxtree
autoname
select -assert-any t:$barrier

design -push-copy

# Write out with barriers
write_verilog barriers.v
write_smt2 barriers.smt2
write_btor barriers.btor2
write_firrtl barriers.firrtl

optbarriers -remove
select -assert-none t:$barrier

# Write out without
write_verilog nobarriers.v
write_smt2 nobarriers.smt2
write_btor nobarriers.btor2
write_firrtl nobarriers.firrtl

exec -expect-return 0 -- bash -c "diff <(sort barriers.v) <(sort nobarriers.v)"
# The amount of concats/slices changes slightly but I don't think this is a big
# deal, its to do with how the backend exports ranges of signals I think
exec -expect-return 0 -- bash -c "diff <(awk '{print \$2}' barriers.btor2 | sort | rg -v '(concat|slice)') <(awk '{print \$2}' nobarriers.btor2 | sort | rg -v '(concat|slice)')"

# Firrtl export seems to be kinda broken in general so these aren't the same,
# but from looking at the diff the changes don't seem to be due to the barriers
# exec -expect-return 0 -- bash -c "diff barriers.firrtl nobarriers.firrtl"

# Synth down to aig and do the same
design -pop
# slightly random commands to create some kind of aig without flops,
# doesn't matter too much what happens here
expose -evert-dff
delete t:$dff
synth
delete t:$_DFFE_PP_
aigmap
abc -g aig
aigmap
stat
write_aiger2 barriers.aig

optbarriers -remove
select -assert-none t:$barrier
write_aiger2 nobarriers.aig

exec -- abc -c "cec -v nobarriers.aig barriers.aig"


for (int i = 0; i < GetSize(new_conn.first); i++) {
const auto lhs = new_conn.first[i], rhs = new_conn.second[i];
auto& sigsig = !lhs.is_wire() || !lhs.wire->name.isPublic() ? skip_conn : barrier_conn;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yes I had thought you could have this with something like tristates, but I double checked and such a scenario does get caught by the check a few lines earlier and thrown as an error, so it is fine to assume its non-const here. Even though its checked just before I'll add an assertion in case the two ever end up separated by refactoring.

register_reverse_wire_map(y_id, cell->getPort(ID::Y));
continue;
}
if (cell->type == ID($pos)) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think its fine to leave as is for now


if (cell->type == ID($not)) return export_bvop(cell, "(bvnot A)");
if (cell->type == ID($pos)) return export_bvop(cell, "A");
if (cell->type.in(ID($pos), ID($buf), ID($barrier))) return export_bvop(cell, "A");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, have updated export_bvop to only check for A_SIGNED if it exists

kernel/satgen.cc Outdated
}

if (cell->type.in(ID($pos), ID($buf), ID($neg)))
if (cell->type.in(ID($pos), ID($buf), ID($barrier), ID($neg)))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about how to opt in - had added this so I could write equiv_opt tests but we have no easy way of separating "you can use this in sat models for verification" from "you can use this in sat models for optimization". I've removed this for now, as well as the similar implementation in cellaigs.cc because that meant barriers could be lost when running aigmap.

* This acts as an optimization barrier, limiting the rewriting allowed
* For now this forbids all optimization, but attributes can be added to
  opt in to specific optimizations as required
* This uses $barrier optimization barriers to connect wires into the
  flattened module instead of connections
* This can optionally ignore rewriting the outputs of cells or processes
* This by default rewrites drivers of wires with public names but can
  also optionally rewrite drivers of wires with private names
* A -remove flag allows cleaning up the design by replacing barriers
  with connections
* This uses optbarriers and flatten -barriers to insert barriers on
  public wires in the design, limiting the optimization that can affect
  them
* this is needed to handle $barrier cells which otherwise are similar to
  $buf but do not have a sign parameter
* this was added earlier in the PR with the intention of making
  equiv_opt work, but it means barriers are lost through passes like
  aigmap and can potentially be used in optimization passes so it is better to
  remove it
@georgerennie georgerennie force-pushed the george/optimization_barriers branch from 3bcca55 to 24a9cbe Compare July 15, 2025 15:37
@jix jix self-assigned this Sep 1, 2025
@jix jix self-requested a review September 1, 2025 14:30
@jix
Copy link
Member

jix commented Sep 1, 2025

@georgerennie since you've kept working on this PR on and off, I'm wondering if there are any parts left you consider still unfinished or that you'd like to discuss before any of this getting merged? (Apart from resolving the merge conflicts, of course.)

I also still need to perform full review, but just from what we've discussed before and going by the commit messages, I'm wondering if it makes sense to split this PR into multiple parts so we can merge the core functionality, a minimal pass to add and remove barriers and the changes required for passes to respect the barriers first and merge more fine grained user control over barriers, backend support and attributes to represent weaker barriers in one or many follow up PRs.

@georgerennie
Copy link
Collaborator Author

I think this is all ready to be merged in its current state (modulo the merge conflicts). I'm happy to break it up into a few PRs if that makes it easiest to review. Will close it and open PRs for the constituent parts (although I guess these will be stacked at least on the kernel changes but I can rebase once those land).

@jix
Copy link
Member

jix commented Sep 8, 2025

I don't mind closing this PR as soon as the constituent parts have their own PRs, but until then I'd prefer to keep this open so it doesn't disappear from my assignee:@me is:open type:issue,pr org:YosysHQ bookmark.

@jix jix reopened this Sep 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants